Certifiable Distributional Robustness with Principled Adversarial Training

Authors

  • Aman Sinha
  • Hongseok Namkoong
  • John C. Duchi
Abstract

Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We take the principled view of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbation of the underlying data distribution in a Wasserstein ball, we provide a training procedure that augments model parameter updates with worst-case perturbations of training data. For smooth losses, our procedure provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization. Furthermore, our statistical guarantees allow us to efficiently certify robustness for the population loss. For imperceptible perturbations, our method matches or outperforms heuristic approaches.
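The training procedure described in the abstract, alternating worst-case perturbation of the inputs with ordinary parameter updates, admits a compact implementation. Below is a minimal sketch assuming PyTorch; `model`, `loss_fn`, the penalty coefficient `gamma`, and the ascent step count and step size are illustrative placeholders, not values prescribed by the paper.

```python
import torch

def wrm_perturb(model, loss_fn, x, y, gamma=1.0, steps=15, lr_adv=0.1):
    """Approximate the inner maximization
    sup_z [ loss(theta; z, y) - gamma * ||z - x||^2 ]
    by gradient ascent on the input, starting from the clean batch x."""
    z = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        penalty = gamma * ((z - x) ** 2).sum()
        objective = loss_fn(model(z), y) - penalty
        (grad,) = torch.autograd.grad(objective, z)
        with torch.no_grad():
            z += lr_adv * grad  # ascent step on the penalized objective
    return z.detach()

def wrm_step(model, loss_fn, optimizer, x, y, gamma=1.0):
    """One training step: update parameters on the worst-case perturbed batch."""
    z = wrm_perturb(model, loss_fn, x, y, gamma)
    optimizer.zero_grad()
    loss_fn(model(z), y).backward()
    optimizer.step()
```

Intuitively, a larger `gamma` penalizes moving probability mass away from the data more heavily, so the adversary stays closer to the training points; for smooth losses and sufficiently large `gamma`, the inner problem becomes concave and hence cheap to solve, which is the source of the paper's computational guarantees.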

Similar articles

Certifying Some Distributional Robustness with Principled Adversarial Training

Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasse...

Full text

Distributional Smoothing by Virtual Adversarial Examples

Smoothness regularization is a popular method to decrease generalization error. We propose a novel regularization technique that rewards local distributional smoothness (LDS), a KL-distance based measure of the model's robustness against perturbation. The LDS is defined in terms of the direction to which the model distribution is most sensitive in the input space. We call the training with LDS r...

Full text

Towards Deep Learning Models Resistant to Adversarial Attacks

Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides a broad and unifying view on much of the prior work...

Full text

Cascade Adversarial Machine Learning Regularized with a Unified Embedding

Deep neural network classifiers are vulnerable to small input perturbations carefully generated by the adversaries. Injecting adversarial inputs during training, known as adversarial training, can improve robustness against one-step attacks, but not for unknown iterative attacks. To address this challenge, we propose to utilize embedding space for both classification and low-level (pixel-level)...

Full text

Journal:
  • CoRR

Volume: abs/1710.10571  Issue: -

Pages: -

Publication date: 2017